Abstract

Background: Plasmodium falciparum is the most malignant agent of human malaria. It belongs to the taxon 
Laverania, which includes other ape-infecting Plasmodium species. The origin of the Laverania is still debated. P. 
falciparum exports pathogenicity-related proteins into the host cell using the Plasmodium export element (PEXEL). 
Predictions based on the presence of a PEXEL motif suggest that more than 300 proteins are exported by P. 
falciparum, while there are many fewer exported proteins in non-Laverania. 

Results: A whole-genome approach was applied to resolve the phylogeny of eight Plasmodium species and four 
outgroup taxa. Using 218 orthologous proteins, we obtained unanimous support for a sister group position of
Laverania and avian malaria parasites. This observation was corroborated by the analyses of 28 exported proteins 
with orthologs present in all Plasmodium species. Most interestingly, several deviations from the P. falciparum 
PEXEL motif were found to be present in the orthologous sequences of non-Laverania. 

Conclusion: Our phylogenomic analyses strongly support the hypothesis that the Laverania were founded
by a single Plasmodium species switching from birds to African great apes or vice versa. The deviations from the
canonical PEXEL motif in orthologs may explain the comparably low number of exported proteins that have been 
predicted in non-Laverania. 



Background 

Malaria is one of the most common infectious diseases, 
putting about two billion humans at risk and resulting 
in about one million fatalities each year [1]. Malaria is 
caused by protozoan parasites of the genus Plasmodium 
(Haemosporidae; Apicomplexa). Species of this genus 
undergo a complex life cycle including an asexual prolif- 
eration phase in the erythrocytes of vertebrate hosts. 

Although hundreds of Plasmodium species are currently
known, only a few infect humans. In temperate climate
zones, human malaria infection is largely due to P.
vivax, but the life-threatening form of this disease is 
almost exclusively caused by P. falciparum. About 60 
years ago, the high pathogenicity of P. falciparum led to 
the proposal that this parasite may be a rather recent 
acquisition from a non-human host [2]. Since then, it 
has become evident that P. falciparum indeed is closely 
related to other Plasmodium species from African great
apes [3,4]. Together they constitute the subgenus Laver- 
ania and several reciprocal host switches have occurred 
during the evolution of this group of malaria parasites 
[5-9]. 

The evolutionary ancestry of P. falciparum and the 
other Laverania is still a matter of debate. It remains
unresolved whether this subgenus is more closely
related to other mammalian malaria
parasites or whether it shares a common ancestry with 
bird-infecting Plasmodium species (reviewed in [10]). 
Most molecular phylogenetic studies of the genus Plas- 
modium are based on the analysis of single proteins 
such as cytochrome b oxidase, adenylosuccinate lyase, 
and caseinolytic protease C [10]. While these proteins 
contain sufficient phylogenetic information to resolve 
the relationships within the Laverania, multiple substitu- 
tions per site (homoplasy) limit their utility at a deeper 
phylogenetic level [10]. 

Upon invasion by P. falciparum, erythrocytes are sub- 
jected to an extensive remodeling process resulting in 
altered mechanical and adhesive properties [11]. 
Prominent examples include the formation of cytoadherence
knobs at the erythrocyte membrane and the associated
exposure of PfEMP1 (P. falciparum erythrocyte
membrane protein 1) at the surface of the infected cell
[12]. Plasmodium proteins involved in this remodeling 
process have to pass the parasitophorous vacuole mem- 
brane (PVM) on their way from the parasite into the 
erythrocyte; most of these proteins are characterized by 
a hydrophobic signal sequence for targeting the protein 
to the endoplasmic reticulum (ER) and a sequence motif 
(RxLxE/Q/D) either referred to as Plasmodium export 
element (PEXEL; [13]) or vacuolar transport signal [14]. 
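
As an illustration of how the consensus quoted above (RxLxE/Q/D) can be turned into a sequence scan, the following minimal sketch searches a protein sequence with a regular expression. The function name and the example sequence are hypothetical, and the sketch deliberately ignores the signal sequence requirement and any positional constraints used by the published predictors [13,14,19,20].

```python
import re

# PEXEL consensus as quoted in the text: R-x-L-x-(E/Q/D).
# A lookahead is used so that overlapping matches are also reported.
PEXEL_PATTERN = re.compile(r"(?=(R.L.[EQD]))")


def find_pexel_like(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, pentamer) for every PEXEL-like site in seq."""
    return [(m.start(1), m.group(1)) for m in PEXEL_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence, used only to show the call.
    print(find_pexel_like("MKKLFSRILAEGNN"))
```

A pattern of this kind is strict by construction: a single substitution at position 1, 3 or 5 makes a site invisible to the scan, which is the effect examined for non-Laverania orthologs later in the paper.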

The PEXEL motif is cleaved by the aspartyl protease 
plasmepsin V in the ER of the parasite [15-17] and the 
nascent protein is released into the parasitophorous 
vacuole. From there it is transported through the PVM 
into the host cell by the Plasmodium translocon of 
exported proteins (PTEX; [18]). Predictions based on 
the presence of the PEXEL suggest that more than 300 
P. falciparum proteins are exported into the host cell 
[13,14,19,20]. Notably, PfEMP1s are structurally different,
having an export element that precedes the signal
sequence (R/KxL/V/MxE/D; cf. [13]). This export ele- 
ment appears to be necessary for export [13] but is not 
cleaved in vivo, and therefore might be functionally dis- 
tinct [21]. 

The conservation of plasmepsin V and the compo- 
nents of PTEX throughout the genus Plasmodium indi- 
cates that the same protein export machinery is used by 
all Plasmodium species [16-18]. In addition, PEXEL 
sequences from P. falciparum proteins proved to be 
functional in rodent malaria parasites [13] and vice 
versa [22]. Thus, in principle, the screens to detect 
exported proteins in P. falciparum should be extendable 
to other Plasmodium species. However, surprisingly few 
proteins have been detected outside of the Laverania 
using the P. falciparum PEXEL motif, and it has been 
suggested that these species export substantially fewer 
proteins into the host cell than P. falciparum [13,14,19]. 

Non-Laverania, however, also induce elaborate mor- 
phological changes in their host cells and the low num- 
ber of predicted exported proteins may argue for a 
prominent role of PEXEL-negative exported proteins 
(PNEPs; reviewed in [23]). An additional, thus far unex- 
plored, explanation could be a slightly different consen- 
sus of the PEXEL motif in Plasmodium taxa other than 
Laverania that could hamper the prediction of these 
proteins. This would inevitably lead to an underestima- 
tion of the respective exportomes. 

Here, we took advantage of the available genomic 
sequences from eight Plasmodium species and four 
other apicomplexan species. Orthologous proteins were 
identified and used (i) to reconstruct the phylogeny of 
these species, (ii) to obtain a set of exported P.
falciparum proteins that are conserved throughout Plasmodium
evolution, and (iii) to investigate the evolutionary
plasticity of the corresponding Plasmodium export
elements. 

Methods 

Source of sequence data 

The genomic sequences of P. falciparum 3D7 [24], P. 
yoelii 17XNL [25], P. berghei Anka [26], P. chabaudi AS 
[26], P. knowlesi H [27], P. vivax Sal-1 [28], as well as P. 
reichenowi and P. gallinaceum (both unpublished data 
produced by the Wellcome Trust Sanger Institute; used 
with permission) were obtained from PlasmoDB v. 6.1 
[29]; sequences of Toxoplasma gondii ME49 were 
obtained from ToxoDB v. 5.2 [30], sequences of Babesia 
bovis T2Bo from Integr8 [31], sequences of Cryptospori- 
dium parvum Iowa from CryptoDB [32]; sequences of 
Theileria annulata Ankara [33] were downloaded from 
the Sanger Institute (http://www.sanger.ac.uk/resources/downloads/protozoa/).

Collection of orthologs 

The dataset of orthologous proteins for phylogeny 
reconstructions was compiled as described before [34]. 
In brief, InParanoid-TC was used with P. falciparum, P. 
vivax, P. knowlesi, P. yoelii, P. berghei, P. chabaudi, T. 
gondii, and B. bovis as primer taxa. For 921 proteins,
orthologs were present in all eight primer taxa. These
921 core orthologs then served as input for HaMStR to
search for the corresponding proteins in P. reichenowi,
P. gallinaceum, C. parvum, and T. annulata. The following
search species/reference species pairs were used in the
HaMStR search: P. reichenowi/P. falciparum, P. gallinaceum/P.
falciparum, C. parvum/T. gondii, and T. annulata/B.
bovis. HaMStR could extend 218 core
orthologs with sequences from all four species such that 
each ortholog group consisted of twelve sequences. The 
amino acid sequences for each of the 218 core orthologs
were aligned with MAFFT [35] using the options --maxiterate
1000 and --localpair. The 218 single alignments
were concatenated to form a super-alignment with 
192,102 aa positions. This super-alignment was processed
twice: (i) positions for which fewer than half of the
sequences were represented by an amino acid were
removed, and (ii) Gblocks 0.91b [36] was applied using
the following parameters: minimum number of
sequences for a conserved position, 7; minimum
number of sequences for a flanking position, 10;
maximum number of contiguous nonconserved
positions, 4; minimum length of a block, 10;
and allowed gap positions, none.
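
The first processing step (removal of positions at which fewer than half of the sequences are represented by an amino acid) can be sketched as follows. This is a minimal illustration, not the script used in the study; the function name is hypothetical and it is assumed that missing data are encoded as "-" in equal-length aligned strings.

```python
def drop_sparse_columns(alignment: list[str], gap: str = "-") -> list[str]:
    """Remove alignment columns in which fewer than half of the sequences
    carry a residue (assumption: every non-gap character is a residue)."""
    if not alignment:
        return []
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        col
        for col in range(length)
        # keep the column if residues are present in at least half of the rows
        if 2 * sum(seq[col] != gap for seq in alignment) >= n_seqs
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


if __name__ == "__main__":
    print(drop_sparse_columns(["AC-G", "A--G", "-C-G", "A--T"]))
```

Applied to the twelve-taxon super-alignment, a filter of this kind would retain a column only if at least six sequences contribute a residue.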

To obtain the collection of exported proteins that 
have functionally equivalent orthologs in the other Plas- 
modium species, the two most comprehensive 
predictions of exported P. falciparum proteins were
used [19,20]. These predictions contain 396 and 422 
proteins (not including the structurally distinct 
PfEMP1s), respectively; the combination of both resulted
in a non-redundant set of 531 putatively exported pro- 
teins. Each protein was used as query for a tBLASTn 
search against the P. falciparum genome. Proteins, for 
which the E value of the best BLAST hit (not consider- 
ing the hit against itself) was larger than 10' , were 
considered to have no paralogs present in P. falciparum 
and were used for further analysis. For each paralog-free 
protein a reciprocal tBLASTn search was performed to 
identify candidate orthologs in the other Plasmodium 
species (E value cut-off: < 10^-10). Proteins with a single
ortholog present in each of the eight Plasmodium species
were aligned with MAFFT [35] using the options
--maxiterate 1000 and --localpair.
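
The selection of proteins with a single candidate ortholog per species can be illustrated with a small filter over BLAST results. The sketch below is an assumption-laden simplification: the tuple layout (query, species, subject, E value), the function name and the example hits are hypothetical, and the reciprocal search step described above is not reproduced.

```python
from collections import defaultdict


def single_copy_queries(hits, species, cutoff=1e-10):
    """Return queries with exactly one distinct hit below the E value
    cut-off in every listed species.

    hits: iterable of (query, species, subject, evalue) tuples
          (hypothetical layout, e.g. parsed from tabular BLAST output).
    """
    found = defaultdict(lambda: defaultdict(set))
    for query, sp, subject, evalue in hits:
        if evalue < cutoff:
            found[query][sp].add(subject)
    return sorted(
        query
        for query, per_species in found.items()
        if all(len(per_species.get(sp, ())) == 1 for sp in species)
    )


if __name__ == "__main__":
    example_hits = [
        ("protA", "Pv", "pv1", 1e-50),
        ("protA", "Pk", "pk1", 1e-40),
        ("protB", "Pv", "pv2", 1e-30),
        ("protB", "Pv", "pv3", 1e-20),  # second hit: not single copy
        ("protB", "Pk", "pk2", 1e-30),
        ("protC", "Pv", "pv4", 1e-5),   # fails the cut-off
        ("protC", "Pk", "pk3", 1e-60),
    ]
    print(single_copy_queries(example_hits, ["Pv", "Pk"]))
```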

Phylogeny reconstruction 

Maximum likelihood (ML) trees were reconstructed 
with RAxML v. 7.2.2 [37] using the WAG model [38] of 
amino acid sequence evolution with empirical amino 
acid frequencies (option F). Substitution rate heteroge- 
neity was modeled using a gamma distribution, allowing 
for a fraction of invariant sites (option GAMMAI). Bayesian
tree search was performed with PhyloBayes v. 3.2
[39] using the WAG model. Four MCMC chains were 
run for 10,000 cycles. Every 10th cycle was sampled and
convergence of the chains was checked pairwise with
bpcomp, allowing for a burn-in of 1,000 cycles. Increasing
the burn-in or using other models of amino acid
sequence evolution such as the CAT [40] or LG model 
[41] did not change the results (not shown). 

Testing of alternative phylogenies

The small number of taxa in our study allows the eva- 
luation of every possible tree topology. However, we 
reduced the number of tested trees by imposing the fol- 
lowing constraints: monophyly of the genus Plasmo- 
dium; monophyly of B. bovis and T. annulata 
(Piroplasmida); monophyly of T. gondii and C. parvum 
(Eimeriorina); monophyly of P. yoelii and P. berghei; 
monophyly of P. falciparum and P. reichenowi; mono- 
phyly of P. vivax and P. knowlesi. Note that all seven 
constraints represent accepted evolutionary relationships 
(see references in Table 1) except the monophyly of T. 
gondii and C. parvum [42], and have been confirmed by 
our unrestricted heuristic tree searches. We computed 
the likelihood of the resulting 105 alternative tree topol- 
ogies with TREE-PUZZLE v. 5.2.pl21.1 [43] using the 
WAG model of sequence evolution. Substitution rate 
heterogeneity was modeled with a gamma distribution 
assuming four rate categories and empirical amino acid 
frequencies. Hypothesis testing was performed using the
routines provided by TREE-PUZZLE and by CONSEL
[44].
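
One way to see where the figure of 105 topologies may come from is the standard count of rooted binary trees, (2n-3)!!, for n labelled lineages. The sketch below assumes that the constraints collapse the ingroup to five lineages (Laverania, P. vivax + P. knowlesi, P. yoelii + P. berghei, P. chabaudi, and P. gallinaceum) and that the constrained outgroup fixes the root of the Plasmodium clade; this reading of the constraints is an interpretation, not a statement from the Methods.

```python
def rooted_binary_topologies(n_lineages: int) -> int:
    """Number of rooted binary tree topologies for n labelled lineages,
    i.e. the double factorial (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3)."""
    if n_lineages < 2:
        raise ValueError("at least two lineages are required")
    count = 1
    for odd in range(3, 2 * n_lineages - 2, 2):
        count *= odd
    return count


if __name__ == "__main__":
    # Five ingroup lineages under the assumed reading of the constraints.
    print(rooted_binary_topologies(5))
```

Under this assumption, five lineages give 3 x 5 x 7 = 105 rooted arrangements, i.e. the best tree plus 104 alternatives.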

Sequence analysis 

Pairwise amino acid identities and similarities were cal- 
culated with GeneDoc v. 2.6 [45] using the Blosum 62 
model. PEXEL sequences of the P. falciparum proteins 
were identified via a match to the published consensus 
sequences [13,14,19,20]. The putative PEXEL sequences 
of proteins from other Plasmodium species were 
extracted by aligning these proteins to their ortholog in 
P. falciparum; we then used the homologous amino acid 
positions to the P. falciparum PEXEL as candidate 
export elements in these species. PEXEL sequences from 
the individual proteins were aligned separately for each 
species by hand and the corresponding PEXEL motifs 
were generated with WebLogo [46]. Presence of hydro- 
phobic signal sequences was assessed using SignalP v. 
3.0 [47]. 
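
The extraction of candidate export elements from orthologs via homologous alignment positions can be sketched as below. The function name and the toy alignment are hypothetical; the sketch assumes a pairwise alignment given as two equal-length gapped strings and a known 0-based start of the PEXEL in the ungapped P. falciparum sequence.

```python
def homologous_segment(aligned_ref: str, aligned_other: str,
                       ref_start: int, length: int = 5) -> str:
    """Return the characters of aligned_other found in the alignment columns
    occupied by residues ref_start .. ref_start + length - 1 of the ungapped
    reference. Gaps in the other sequence are returned as '-'."""
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have the same length")
    columns = []
    ref_pos = -1  # index in the ungapped reference sequence
    for col, residue in enumerate(aligned_ref):
        if residue == "-":
            continue
        ref_pos += 1
        if ref_start <= ref_pos < ref_start + length:
            columns.append(col)
    if len(columns) != length:
        raise ValueError("motif extends beyond the reference sequence")
    return "".join(aligned_other[col] for col in columns)


if __name__ == "__main__":
    # Toy alignment: the reference motif RILAE starts at ungapped position 2.
    print(homologous_segment("MK-RILAEG", "MKSRLLSQG", ref_start=2))
```

Returning gap characters rather than silently skipping them keeps deletions within the motif region visible in the subsequent manual alignment.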

Results and Discussion 

Evolutionary ancestry of P. falciparum and other 
Laverania 

We extracted the genomic sequences of eight Plasmodium
species (P. falciparum, P. reichenowi, P. vivax, P.
knowlesi, P. gallinaceum, P. chabaudi, P. yoelii, and P. 
berghei) and four additional apicomplexan species (T. 
gondii, C. parvum, T. annulata, and B. bovis) from public
databases. HaMStR, a hidden Markov model-based
tool [34], was used to identify 218 proteins with ortho- 
logs in all twelve species (Additional file 1). This num- 
ber is similar to that used in a recent phylogenomic 
study of eight apicomplexan species, including two spe- 
cies from the genus Plasmodium [48]. 

The single alignments of the 218 proteins were conca- 
tenated and positions for which less than half of the 
taxa were represented by an amino acid were removed. 
This resulted in a super-alignment with 135,360 aa posi- 
tions (Additional file 2), which was used for initial maxi- 
mum likelihood (ML) and Bayesian tree reconstructions. 
While tree topologies inferred from the ML analysis 
were identical with those obtained in later analyses (see 
below; Figure 1), MCMC chains did not converge on a 
single topology, indicating that the dataset includes con- 
flicting phylogenetic information. Therefore, the 135,360 
aa alignment was further processed using Gblocks [36]. 
This procedure has been demonstrated to improve phy- 
logenetic analyses by reducing the impact of misaligned 
regions (due to very high sequence divergence) and 
homoplasy (due to sequence saturation) [49]. We 
obtained a final alignment comprising 49,521 aa posi- 
tions and no missing data (Additional file 3). In Bayesian 
analysis, MCMC chains readily converged on the same 
topology (maxdiff: 0; meandiff: 0). In the resulting
consensus tree all nodes received strong support and the
same topology was obtained in ML analyses (Figure 1).

Table 1 Molecular phylogenetic analyses attempting to resolve the relationships among malaria parasites

| Reference | Gene* | Outgroup taxa | Number of ingroup taxa | Laverania + avian malaria parasites |
|---|---|---|---|---|
| Waters et al. 1991 [55] | SSU rRNA | Acanthamoeba castellani | 10 | Yes |
| Escalante and Ayala 1994 [3] | SSU rRNA | Theileria annulata, Babesia bovis, Sarcocystis fusiformis | 11 | No |
| Escalante and Ayala 1995 [56] | SSU rRNA | Colpidium campylum, Euplotes aediculatus, Glaucoma chattoni, Opisthonecta henneguyi, Paramecium tetraurelia | 5 | Yes |
| Qari et al. 1996 [68] | SSU rRNA | Toxoplasma gondii, Paramecium tetraurelia | 13 | No |
| Escalante et al. 1998 [69] | Cyt b | Toxoplasma gondii | 17 | No |
| Rathore et al. 2001 [70] | SSU rRNA | Toxoplasma gondii | 8 | Yes |
| | Cyt b | Toxoplasma gondii | 8 | No |
| | ClpC | Toxoplasma gondii | 9 | No |
| Perkins and Schall 2002 [58] | Cyt b | Theileria annulata, Leucocytozoon dubreuli, Leucocytozoon simondi | 52 | No |
| Kissinger et al. 2002 [57] | SSU rRNA | Theileria annulata, Babesia equis | 10 | Yes |
| Leclerc et al. 2004 [59] | SSU rRNA | Toxoplasma gondii, Sarcocystis fusiformis, Babesia bovis | 21 | No |
| Roy and Irimia 2008 [60] | SSU rRNA | Leucocytozoon caulleryi, Leucocytozoon sabrazesi | 18 | No |
| Martinsen et al. 2008 [61] | Cyt b, Cox I, ClpC, ASL (concatenated) | Leucocytozoon spp. | 57 | No |
| Ollomo et al. 2009 [4] | Cyt b, Cox I, Cox III (concatenated) | Leucocytozoon caulleryi | 17 | No |
| Krief et al. 2010 [7] | Dhfr-ts, Msp2 (concatenated) | Leucocytozoon sabrazesi | 42 | No |
| Silva et al. 2010 [53] | 29 proteins (concatenated) | Theileria annulata, Theileria annulata | 8 | No |

Note that only analyses that included non-Plasmodium species as outgroup taxa are listed.

*Abbreviations: SSU rRNA, 18S small subunit ribosomal RNA; Cyt b, cytochrome b; ClpC, caseinolytic protease C; Cox I and III, cytochrome oxidase I and III; ASL,
adenylosuccinate lyase; Dhfr-ts, dihydrofolate reductase-thymidylate synthase; Msp2, merozoite surface protein 2.

The phylogenetic analyses show that the eight Plasmo- 
dium species form a monophyletic clade (100% boot- 
strap support and 1.00 Bayesian posterior probabilities). 
The malaria parasites from rodents (P. chabaudi, P. yoe- 
lii, and P. berghei) are clearly separated from those 
infecting birds and primates (100% bootstrap support 
and 1.00 Bayesian posterior probabilities). Notably, the 
Laverania (P. falciparum and P. reichenowi) do not
group with the other primate-infecting malaria parasites,
but form a well-supported clade with P. gallinaceum
(99% bootstrap support and 1.00 Bayesian posterior
probabilities).

ML as well as Bayesian methods return only the best 
tree and thus provide no information on other tree 
topologies with likelihoods that may not be significantly 
worse. To address this issue, 104 alternative tree topolo- 
gies were tested by inferring their expected likelihood 
weights (ELW; [50]) and their probabilities in the 
approximately unbiased (AU) test [51]. All alternative 
tree topologies (including those with P. gallinaceum 
being the sister group of mammal Plasmodium species) 
were rejected with high confidence (Figure 2; Additional
file 4). Thus, the position of P. gallinaceum as sister
group to the Laverania receives unambiguous support
from the data.

Figure 1 Phylogenetic relationships of eight malaria parasites.
Phylogenetic tree reconstructions were based on 218 proteins. The
single alignments were concatenated to form a super-alignment
and problematic alignment regions were subsequently removed
with Gblocks. This procedure resulted in an alignment with 49,521
aa positions (no missing data), which was used for ML tree
reconstruction and Bayesian tree search. T. gondii, C. parvum, T.
annulata, and B. bovis were used as an outgroup to root the tree.
Only the ML tree is displayed, but the topology of the Bayesian tree
was identical. Numbers at the nodes denote bootstrap support
values (left) and Bayesian posterior probabilities (right). The scale bar
equals 0.05 expected substitutions per site. See also Additional files
1, 2 and 3.
[Tip labels: P. falciparum, P. reichenowi, P. knowlesi, and P. vivax
(primate malaria parasites); P. gallinaceum (avian malaria parasite);
P. chabaudi, P. yoelii, and P. berghei (rodent malaria parasites).]

Figure 2 Alternative relationships among major Plasmodium
lineages resulting from different root placements. The results
for the individual likelihood ratio tests are given below each tree.
ELW, expected likelihood weights [50]; AU, approximately unbiased
test [51]. T. gondii, C. parvum, T. annulata, and B. bovis were used as
an outgroup. Note that only the results from analyses with rate
heterogeneity are shown; results from analyses without rate
heterogeneity were essentially the same. See also Additional file 4.
[Panels: Outgroup position A (c-ELW: 0.991; AU: 0.994); Outgroup
position C (c-ELW: 0.000; AU: 0.007); Outgroup position B (c-ELW:
0.002; AU: 0.001); Outgroup position D (c-ELW: 0.006; AU: 0.013).]

Until now, two other whole-genome approaches 
attempted to resolve the evolutionary relationships of 
the eight Plasmodium species. Davalos and Perkins [52] 
based their analyses on a set of 104 proteins (~26,000 aa 
positions), recovering the same topology among Plasmo- 
dium species as displayed in Figure 1. However, no out- 
group taxa were included to root the tree, and thus no 
information on the evolutionary ancestry of the Lavera- 
nia could be provided. Silva et al. [53], on the other 
hand, based their analyses on a set of 29 proteins 
(~12,000 aa positions) and used two species from the 
genus Theileria to root the tree. While they proposed 
the monophyly of mammalian Plasmodium species, 
some of their results supported a grouping of P. gallina- 
ceum and the Laverania. 

Davalos and Perkins [52] as well as Silva et al. [53] 
both used slow-evolving proteins for their phylogenetic 
inferences. To assess the effect of the evolutionary rate 
on our analysis, we partitioned the 218 proteins of our 
dataset. We first computed a ML tree for each protein 
individually. The length of this protein tree, i.e. the sum 
of its branch lengths, then served as an approximation 
for the evolutionary rate (Figure 3; see also Additional 
file 5). Subsequently, the proteins were categorized into 
three subsets according to their tree lengths. Dataset 1
comprised 65 slow-evolving proteins (tree lengths of less
than two expected substitutions per site). Dataset 2
comprised 88 proteins evolving at an intermediate speed
(tree lengths of two or more but less than four expected
substitutions per site). Dataset 3 comprised the 66 fast-evolving
proteins (tree length of four or more expected
substitutions per site). For subsequent tree reconstruction,
65 proteins were randomly chosen from each partition
such that the same number of proteins was used
for each dataset. The individual alignments were concatenated,
processed with Gblocks as described and used
for ML tree reconstruction (Figure 4). All three datasets
agree in placing P. gallinaceum as sister of the Laverania.
The topology of the tree was thus identical with
that inferred from the complete dataset (Figure 1). We
conclude that our reconstruction of the Plasmodium
phylogeny does not depend on the evolutionary rates of
the proteins used for the phylogeny.

Figure 3 Evolutionary rates of the 218 proteins used for
phylogenetic inferences. The length of the individual protein
trees, i.e. the sum of their branch lengths, served as an approximation
for the evolutionary rate. Note that only the proteins of primer taxa
(P. falciparum, P. vivax, P. knowlesi, P. yoelii, P. berghei, P. chabaudi, T.
gondii, and B. bovis) were considered. See also Additional file 5.
[Histogram; x-axis: sum of expected substitutions per site.]

Figure 4 Phylogenetic relationships inferred using three
subsets of proteins with varying evolutionary rates. The
proteins used for the initial phylogenetic inference were categorized
into three subsets each comprising 65 proteins: (A) Slow-evolving
proteins (tree lengths of less than two expected substitutions per
site); 13,670 aa of 27,063 aa (51%) were used for ML inference. (B)
Proteins evolving at an intermediate speed (tree lengths of two or
more but less than four expected substitutions per site); 14,967 aa
of 50,332 aa (30%) were used for ML inference. (C) Fast-evolving
proteins (tree lengths of four or more expected substitutions per
site); 12,021 aa of 96,774 aa (12%) were used for ML inference. T.
gondii, C. parvum, T. annulata, and B. bovis were used to root the
trees. Numbers at the nodes denote bootstrap support values; scale
bars are equal to 0.05 expected substitutions per site.
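
The rate-based partitioning described above (tree length as the sum of branch lengths, three classes with boundaries at two and four expected substitutions per site, and random subsampling to equal size) can be sketched as follows. The function names, the regular-expression parsing of Newick strings, and the fixed random seed are assumptions made for illustration; quoted labels containing colons are not handled.

```python
import random
import re

# Branch lengths in a Newick string follow a colon, e.g. "(A:0.1,B:0.2);".
_BRANCH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def tree_length(newick: str) -> float:
    """Sum of all branch lengths found in a Newick string."""
    return sum(float(value) for value in _BRANCH.findall(newick))


def rate_class(length: float) -> str:
    """Class boundaries as given in the text: <2, 2 to <4, >=4."""
    if length < 2.0:
        return "slow"
    if length < 4.0:
        return "intermediate"
    return "fast"


def partition_and_subsample(trees: dict[str, str], k: int, seed: int = 0):
    """Assign each protein tree to a rate class, then draw up to k proteins
    at random from each class (seed fixed only for reproducibility)."""
    bins = {"slow": [], "intermediate": [], "fast": []}
    for name in sorted(trees):
        bins[rate_class(tree_length(trees[name]))].append(name)
    rng = random.Random(seed)
    return {
        label: sorted(rng.sample(names, min(k, len(names))))
        for label, names in bins.items()
    }


if __name__ == "__main__":
    toy = {  # hypothetical protein trees
        "p1": "((A:0.5,B:0.5):1.0,C:1.5);",
        "p2": "(A:0.1,B:0.2);",
        "p3": "(A:3.0,B:2.5);",
    }
    print(partition_and_subsample(toy, k=1))
```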

The bootstrap support for the clade consisting of P. 
gallinaceum and Laverania was maximal for the dataset 
comprising proteins evolving at an intermediate speed 
(98%) and minimal for the dataset comprising the fast- 
evolving proteins (76%). The branch leading to the clade 
consisting of P. gallinaceum and Laverania was short 
(~0.02 expected substitutions per site; cf. Figure 4). 
When using fast-evolving proteins, multiple substitu- 
tions in the dataset might confound the phylogenetic 
signal leading to artifacts due to long branch attraction 
[54]. On the other hand, using only slow-evolving pro- 
teins is likely to result in a dataset with a phylogenetic 
signal that is too weak to resolve this branch (see also 
Additional file 6). This may explain why proteins evolving
at an intermediate rate provide the most robust
tree. 

The finding of a relationship between the Laverania 
and avian malaria parasites agrees with earlier studies by 
Waters et al. [55], Escalante and Ayala [56], and Kis- 
singer et al. [57]. However, it contradicts more recent 
results by Perkins and Schall [58], Leclerc et al. [59], 
Roy and Irimia [60] and Martinsen et al. [61]. This dis- 
crepancy may be attributed to the limited phylogenetic 
information in the few proteins that were used in those 
studies [10]. While the selection of proteins may have 
some effect (see above), the number and choice of the 
outgroup taxa deserve particular attention (e.g., [62]). 
Alternative root placements lead to different conclusions 
about the order in which the individual Plasmodium 
species emerged (cf. Figure 2). In many previous stu- 
dies, only a single outgroup taxon was used (Table 1). 
Moreover, in some cases this outgroup was evolutiona- 
rily so distantly related that a meaningful placement of 
the root is unlikely (e.g., [63]). Most recent studies of 
Plasmodium phylogeny used selected species from the 
closely related genus Leucocytozoon as an outgroup (cf. 
Table 1). However, the limited amount of sequence data 
available for this taxon - mainly a few mitochondrial 
genes - currently prevents its use in phylogenomic stu- 
dies. Other haemosporidians (i.e., species from the gen- 
era Haemoproteus, Parahaemoproteus, and Hepatocystis) 
should not be considered as an outgroup since the 
genus Plasmodium has been shown to be paraphyletic 
with respect to these taxa (e.g., [61]). Alternative strate- 
gies for a reliable root placement employ the inclusion 
of multiple outgroup taxa to break the branch separat- 
ing the ingroup from the outgroup, and the use of a 
comprehensive set of proteins [64]. Our trees include 
four apicomplexan species as an outgroup and are based 
on 218 orthologous proteins. We have obtained identical 
tree topologies by employing different tree reconstruction
methods (ML and Bayesian inference) and different mod- 
els of sequence evolution. Moreover, our findings remain 
unchanged when we use proteins with different evolutionary
rates. Finally, likelihood ratio tests rejected all
alternative tree topologies. Thus, we are confident that 
our root placement is robust and that P. gallinaceum and 
the Laverania indeed share a common ancestry. 

An avian parasite as sister to the Laverania has signifi- 
cant implications: it suggests that a host switch from 
birds to African great apes or vice versa has occurred. 
Host switches have repeatedly taken place during the 
evolution of avian Plasmodium species [61]. Moreover, 
avian Plasmodium species are able to infect mammals 
under experimental conditions [65]. Both observations 
are congruent with an evolutionary scenario in which 
the laveranian lineage was established by a single Plas- 
modium species switching from birds to African great 
apes. Subsequent diversification of Laverania associated 
with multiple host switches within the apes eventually 
led to the emergence of P. falciparum in humans [5-9]. 
Note that this scenario also implies that the great diver- 
sity of malaria parasites infecting birds [61] may in fact 
derive from an early host switch by another mammalian 
Plasmodium species. At present, however, we cannot 
exclude the alternative scenario in which the avian Plas- 
modium lineage was established by a Plasmodium spe- 
cies from the laveranian lineage. Therefore, 
phylogenomic analyses considering additional Plasmo- 
dium species (and in particular those infecting birds and 
squamate reptiles) will be necessary to provide a more 
detailed picture of how the Laverania emerged. 

Evolutionary plasticity of the Plasmodium export element 

The availability of Plasmodium genome sequences 
together with the reliable reconstruction of their phylo- 
genetic relationships provides a robust framework to 
investigate the evolutionary history of exported P. falci- 
parum proteins. Here, we used 531 P. falciparum pro- 
teins that had been predicted to be exported into the 
host cell [19,20] to identify functionally equivalent 
orthologs in the other Plasmodium species. BLAST 
searches in the P. falciparum genome identified 102 
proteins without any recognizable paralog (Additional 
file 7), whereas the other 429 proteins mainly belong to 
large gene families such as RIFINs (repetitive inter- 
spersed family) and STEVORs (subtelomeric variable 
open reading frames). These gene families have a com- 
plex evolutionary history and have undergone indepen- 
dent lineage-specific diversifications [19]. This indicates 
that even if homologs of these proteins exist in the 
other Plasmodium species, they do not necessarily share 
the same function. These proteins were therefore 
excluded from further analyses. 



Subsequent BLAST searches for homologs of the 102 
paralog-free proteins in the other Plasmodium species 
identified 33 proteins with a homolog present in each of 
the species (Table 2). Orthology between the members 
in the 33 groups of proteins was confirmed by inferring 
the corresponding sequence trees with a Bayesian 
approach as described in the Methods section (Figure 5; 
Additional file 7). Whereas in 27 cases this tree was 
congruent to the species tree, six sequence trees differed 
from the species tree in the position of the P. gallina- 
ceum sequences. However, subsequent likelihood ratio 
tests revealed that superimposing the species tree did 
not lead to significantly worse likelihoods (Additional 
file 8). The pairwise similarities between the P. falci- 
parum proteins and their orthologs in the other Plasmo- 
dium species are given in Table 2. The orthologs from 
P. reichenowi display the highest degree of similarity. 
This finding is expected given the sister group relation- 
ship of P. falciparum and P. reichenowi. Among the 
remaining six non-laveranian taxa the orthologs from P. 
gallinaceum are overall most similar to the P. falci- 
parum proteins. This lends further support to our con- 
clusion that the Laverania and P. gallinaceum share a 
common ancestry. 
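
As an illustration of how a pairwise identity value such as those in Table 2 can be obtained from an alignment, the following sketch counts identical residues over aligned columns. The denominator convention (columns in which neither sequence has a gap) is an assumption; the definition used for Table 2 is not given in this section, and the sequences are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap.

    The choice of denominator is an assumption; other definitions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0

# Made-up aligned fragments: 6 of 8 ungapped columns are identical.
identity = percent_identity("MKRLLSE-A", "MKKILSEQA")
print(identity)  # 75.0
```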

Both reciprocal best BLAST hit searches and phyloge- 
netic tree reconstructions indicate that the proteins in 
the 33 groups are encoded by genes that remained sin- 
gle copy throughout evolution (one-to-one orthologs). 
Ample evidence exists that such one-to-one orthologs 
are functionally equivalent [66]. Therefore, we conclude 
that if the P. falciparum protein is exported, its ortho- 
logs in other Plasmodium species are exported as well 
and hence, that these proteins are suitable to assess the 
evolutionary plasticity of the PEXEL motif. Note that 
five of these 33 proteins have already been confirmed to 
be exported in P. falciparum using GFP-constructs 
([20]; Table 2). However, five other proteins appear not to be 
exported ([19,20]; Table 2) and were therefore omitted from 
further analyses. 
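
The reciprocal best BLAST hit criterion mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: hits are assumed to be available as (query, subject, bit score) tuples, the helper names are hypothetical, and no score or coverage thresholds are applied because none are given in the text.

```python
def best_hits(hits):
    """Return {query: subject}, keeping the highest-scoring hit per query.

    hits: iterable of (query_id, subject_id, bit_score) tuples (assumed format).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)

# Toy example with made-up identifiers and scores.
pfa_vs_pvi = [("pfa1", "pvi1", 310.0), ("pfa1", "pvi2", 95.0),
              ("pfa2", "pvi2", 120.0)]
pvi_vs_pfa = [("pvi1", "pfa1", 305.0), ("pvi2", "pfa1", 99.0),
              ("pvi2", "pfa2", 90.0)]
pairs = reciprocal_best_hits(pfa_vs_pvi, pvi_vs_pfa)
print(pairs)  # [('pfa1', 'pvi1')]
```

Here `pfa2` is not paired because the best hit of `pvi2` points back to `pfa1`; such cases are the ones that a tree-based check is meant to resolve.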

The amino acid alignments of the remaining 28 ortho- 
log groups were used to identify the regions that corre- 
spond to the P. falciparum PEXEL in the sequences 
from the other species. Subsequently, the candidate 
PEXEL sequences from all proteins were extracted and 
aligned separately for each species (Additional file 8). 
From these alignments the individual PEXEL motifs 
were determined and compared to those of P. falci- 
parum (Figure 6). The motifs were found to be largely 
similar across the different Plasmodium species. How- 
ever, several deviations from the functionally important 
amino acids were observed; the amino acids at positions 
1 and 3 are crucial for efficient cleavage by plasmepsin V, 
while the amino acid at position 5 affects the export rate 
of the nascent protein [67]. 
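
A per-species motif of the kind summarized in a sequence logo is, in essence, a table of residue frequencies at each motif position. The sketch below shows this computation for a set of equal-length candidate sequences; the function name and the example pentamers are made up for illustration and are not taken from the data.

```python
from collections import Counter

def position_frequencies(motifs):
    """Relative residue frequencies at each position of equal-length motifs."""
    if not motifs:
        return []
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(m[i] for m in motifs)
        freqs.append({aa: n / len(motifs) for aa, n in counts.items()})
    return freqs

# Made-up five-residue candidates (positions 1 to 5 of a PEXEL-like motif).
candidates = ["RNLSE", "RKLAE", "KSLTE", "RILCQ"]
freqs = position_frequencies(candidates)
print(freqs[0])  # {'R': 0.75, 'K': 0.25}
print(freqs[2])  # {'L': 1.0}
```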

Table 2 Exported P. falciparum proteins with one-to-one orthologs present in all Plasmodium species 
Shown are the percent amino acid identities (similarities) of the P. falciparum proteins to their orthologs in the other Plasmodium species. See also Additional file 8. 
Exported proteins originally predicted by 1 Sargeant et al. [19] and 2 van Ooij et al. [20]; 3 found to be conserved within the genus Plasmodium [20]; 4 export of 
these proteins has been experimentally validated in P. falciparum [20]; experimental approaches by 5 Sargeant et al. [19] and 6 van Ooij et al. [20] suggest that 
these proteins in fact are not exported. Abbreviations: Pre, P. reichenowi; Pga, P. gallinaceum; Pkn, P. knowlesi; Pvi, P. vivax; Pch, P. chabaudi; Pyo, P. yoelii; Pbe, P. 
berghei. 



The most prominent difference between the Plasmo- 
dium species was found for the positively charged 
amino acid at position 1 of the PEXEL motif. All 28 P. 
falciparum proteins harbor an arginine (R), whereas 
about 20% of the proteins from non-Laverania have a 
lysine (K) at this position. Three lines of evidence indi- 
cate that this alternate PEXEL is nevertheless functional: 
(i) lysine at position 1 of the PEXEL was found in ortho- 
logs of those P. falciparum proteins whose export into 
the host cell has been confirmed (Figure 5), and thus 
our observation is not restricted to proteins that might 
have been erroneously predicted as being exported; (ii) 
recent experimental evidence suggests that the typical 
cleavage at the leucine (L) at position 3 can occur in 
proteins containing lysine at position 1 (PFI1780w and 
MAL3P8.15; [70]); and (iii) a small number of proteins 
with a lysine at position 1 of the PEXEL have already 
been predicted to be exported using a hidden Markov 
model-based prediction method (21 in P. falciparum, 
three or fewer in each of the other Plasmodium species; 
cf. [19]). Other, less prominent deviations at position 1 
include the presence of histidine (H) in the P. knowlesi 
and P. vivax orthologs of PFC0435w and of glutamine (Q) 
in the P. gallinaceum protein that is orthologous to 
PFA0210c (Figure 5). Both PFC0435w and PFA0210c belong 
to the confirmed set of exported proteins in P. falciparum 
[20], and therefore these PEXEL sequences are likely to be 
functional as well. Position 3, which almost invariably 
harbors a hydrophobic leucine (L), was also found to be 
almost invariant in the orthologs of the confirmed exported 
P. falciparum proteins. However, several orthologs of 
P. falciparum proteins that have not yet been confirmed 
to be exported have an isoleucine (I) at this position 
(Figure 6). Position 5, which is considered to be the least 
conserved position [13,14], was found to be even more 
variable in the group of confirmed exported P. falciparum 
proteins. 

Figure 5 Phylogenies of exported P. falciparum proteins and 
orthologs. Sequence trees were inferred using a Bayesian 
approach; Bayesian posterior probabilities are given at the nodes 
and scale bars are equal to 0.1 substitutions per site. Putative PEXEL 
sequences are given next to each species (shaded in grey). Presence 
of a preceding hydrophobic signal sequence is indicated by a black 
box; absence (due to missing or incorrectly annotated N-terminal 
sequence data) is indicated by a grey box. Note that only proteins 
with confirmed export in P. falciparum are shown. Abbreviations: 
Pfa, P. falciparum; Pre, P. reichenowi; Pga, P. gallinaceum; Pkn, P. 
knowlesi; Pvi, P. vivax; Pch, P. chabaudi; Pyo, P. yoelii; Pbe, P. berghei. 
See also Additional file 8. 

Figure 6 Sequence logos representing the PEXEL motifs of 
eight Plasmodium species. Note that only the proteins presented 
in Table 2 were used to draw the individual PEXEL motifs. 
Phylogenetic relationships were drawn according to Figure 1. For 
details on how the PEXEL motif mediates protein export refer to the 
recent review by Goldberg and Cowman [21]. Abbreviations: Pfa, P. 
falciparum; Pre, P. reichenowi; Pga, P. gallinaceum; Pkn, P. knowlesi; 
Pvi, P. vivax; Pch, P. chabaudi; Pyo, P. yoelii; Pbe, P. berghei. See also 
Additional file 8. 

Even though it remains to be demonstrated that these 
orthologous proteins are cleaved and exported with the 
same efficiency, these observations suggest that the 
PEXEL motif is more variable than previously acknowledged. 
This provides a possible explanation for the small number 
of exported proteins predicted for some Plasmodium species. 
Taking this plasticity into account will be essential to 
arrive at a more comprehensive set of exported proteins 
for all Plasmodium species. 
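
To illustrate what taking this plasticity into account could mean for a motif search, the sketch below compares a strict and a relaxed pattern. It rests on several assumptions: the strict pattern uses the commonly cited consensus R-x-L-x-E/Q/D, which is not spelled out in this section; the relaxed pattern additionally admits lysine at position 1 and isoleucine at position 3, as observed above; and the names `pexel_pattern` and `find_motifs` as well as the test sequence are hypothetical. Real predictors also require a preceding signal sequence and a suitable motif position, which this sketch omits.

```python
import re

def pexel_pattern(pos1="R", pos3="L", pos5="EQD"):
    """Regex for a five-residue PEXEL-like motif built from allowed residues.

    A lookahead is used so that overlapping candidates are also reported.
    """
    return re.compile(f"(?=([{pos1}].[{pos3}].[{pos5}]))")

CANONICAL = pexel_pattern()                     # assumed consensus R.L.[EQD]
RELAXED = pexel_pattern(pos1="RK", pos3="LI")   # admits K at 1 and I at 3

def find_motifs(sequence, pattern):
    """Return (0-based start, pentamer) for every candidate in the sequence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]

# Made-up sequence containing one strict and one lysine-containing candidate.
seq = "MAAAKSLTEGGGRNLSEPP"
strict_hits = find_motifs(seq, CANONICAL)
relaxed_hits = find_motifs(seq, RELAXED)
print(strict_hits)   # [(12, 'RNLSE')]
print(relaxed_hits)  # [(4, 'KSLTE'), (12, 'RNLSE')]
```

A relaxed pattern necessarily raises the number of chance matches, so in practice it would have to be combined with further evidence such as orthology to a known exported protein.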

Conclusion 

Our phylogenetic analyses of orthologs deduced from 
the Plasmodium genomes strongly suggest that the 
subgenus Laverania was established by a single host switch 
from birds to African great apes (or vice versa). However, 
sequences from additional bird-infecting Plasmodium spe- 
cies and the closely related Haemosporida are required to 
better understand the early evolution of the Laverania. 
Exported proteins, as identified by the PEXEL motif, play 
a major role in Plasmodium virulence and facilitate the 
parasite's survival in the host cell. Our results suggest that 
the number of exported proteins is higher in the non- 
laveranian Plasmodium species than previously assumed. 
Comprehensive knowledge of their diversity and evolution 
will help to unravel the emergence of the high pathogeni- 
city of P. falciparum, and may allow the identification of 
novel targets for malaria therapy. 

Additional material 